
fix(iluvatar): compile bindings without CMake CUDA language #613

Merged
voltjia merged 1 commit into master from fix/iluvatar-binding-build
May 20, 2026

@zhangyue207 (Collaborator) commented May 19, 2026

Summary

  • Keep Iluvatar builds out of CMake's CUDA language path so CoreX clang++ is not invoked with CMake-added -x cuda for generated binding dispatch sources.
  • Compile only Iluvatar generated dispatch binding sources through explicit CoreX clang++ -x ivcore custom commands, while leaving NVIDIA and other backend paths unchanged.
  • Emit depfiles for the explicit Iluvatar dispatch compile commands so changes to included generated headers and Iluvatar kernel headers trigger correct incremental rebuilds.

Motivation

The generated .cc binding dispatch sources were forced onto CMake's CUDA source path for both NVIDIA and Iluvatar. On Iluvatar, CMake's CUDA language flow appends -x cuda, which can make CoreX clang++ segfault. This PR keeps Iluvatar on the working CoreX -x ivcore compile path without changing other hardware platforms.

Because the Iluvatar dispatch sources are now compiled by explicit custom commands, they no longer get CMake's normal dependency tracking for compiled sources. To keep incremental builds correct, each command passes -MMD -MF so the compiler writes a depfile, and declares that file with DEPFILE so Ninja rebuilds the object when an included generated header or Iluvatar implementation header changes.
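A minimal sketch of one explicit Iluvatar dispatch compile, assuming a CMake variable ILUVATAR_CLANGXX that points at CoreX clang++ and a bindings target named ops (both hypothetical names, not taken from the PR). The -fPIC flag is also an assumption, based on the object being linked into a shared Python module. It only illustrates how -x ivcore, -MMD -MF, and DEPFILE fit together; the actual CMakeLists.txt changes may differ.

```cmake
# Hypothetical sketch: compile one generated dispatch source with CoreX
# clang++ outside CMake's CUDA language path.
# ILUVATAR_CLANGXX, the `ops` target, and -fPIC are assumptions.
if(POLICY CMP0116)
  # CMake >= 3.20 behavior: adjust DEPFILE paths for the Ninja generator.
  cmake_policy(SET CMP0116 NEW)
endif()

set(src "${CMAKE_CURRENT_BINARY_DIR}/generated_dispatch_0.cc")
set(obj "${CMAKE_CURRENT_BINARY_DIR}/generated_dispatch_0.o")

add_custom_command(
  OUTPUT "${obj}"
  # -x ivcore is used instead of the -x cuda that CMake's CUDA language adds;
  # -MMD -MF makes the compiler record the headers it actually included.
  COMMAND "${ILUVATAR_CLANGXX}" -x ivcore -fPIC -c "${src}" -o "${obj}"
          -MMD -MF "${obj}.d"
  DEPENDS "${src}"
  # DEPFILE tells Ninja to read that file, so edits to included generated
  # headers or Iluvatar kernel headers rebuild this object.
  DEPFILE "${obj}.d"
  COMMENT "Compiling generated_dispatch_0.cc with CoreX clang++ -x ivcore"
  VERBATIM)

# Link the prebuilt object into the bindings module like any other source.
set_source_files_properties("${obj}" PROPERTIES EXTERNAL_OBJECT TRUE GENERATED TRUE)
target_sources(ops PRIVATE "${obj}")
```

Projects whose cmake_minimum_required is 3.20 or newer already get the NEW behavior of CMP0116, so the explicit policy call mainly matters for older minimum versions.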

Closes N/A

Type of Change

  • fix — bug fix
  • feat — new feature / new operator / new platform
  • perf — performance improvement (no behavioral change)
  • refactor — code restructuring without behavior change
  • test — adding or fixing tests only
  • docs — documentation only
  • build / ci — build system or CI configuration
  • chore — tooling, formatting, or other non-code changes
  • Breaking change

Platforms Affected

  • CPU (WITH_CPU)
  • NVIDIA (WITH_NVIDIA)
  • Iluvatar (WITH_ILUVATAR)
  • MetaX (WITH_METAX)
  • Cambricon (WITH_CAMBRICON)
  • Moore (WITH_MOORE)
  • Ascend (WITH_ASCEND)
  • PyTorch C++ bindings (WITH_TORCH)
  • Build system / CMake / CI
  • Python bindings / user-facing API

Test Results on Supported Platforms

| Platform | Built | pytest Result | Notes / Hardware |
| --- | --- | --- | --- |
| NVIDIA | Yes | CI passed | ci / unit / nvidia and ci-v2-shadow / ci-v2-shadow / nvidia passed. NVIDIA CUDA source handling remains unchanged. |
| Iluvatar | Yes | CI passed | ci / unit / iluvatar and ci-v2-shadow / ci-v2-shadow / iluvatar passed. Also verified local compile and wheel install in infiniops-dev/iluvatar:latest with CoreX host mount. |
| MetaX | Yes | CI passed | ci / unit / metax and ci-v2-shadow / ci-v2-shadow / metax passed. Not affected by this PR. |
| Cambricon | Yes | CI passed | ci / unit / cambricon and ci-v2-shadow / ci-v2-shadow / cambricon passed. Not affected by this PR. |
| Moore | Yes | CI passed | ci / unit / moore and ci-v2-shadow / ci-v2-shadow / moore passed. Not affected by this PR. |
| Ascend | No | CI failed | ci / unit / ascend and ci-v2-shadow / ci-v2-shadow / ascend failed. Ascend is not changed by this PR. |
GitHub Actions status

As of the latest PR check query on 2026-05-20 UTC:

```text
SUCCESS  clang-format
SUCCESS  ruff
SUCCESS  ci / Generate matrix from config
SUCCESS  ci-v2-shadow / Generate CI v2 shadow matrix
SUCCESS  ci / unit / nvidia
SUCCESS  ci-v2-shadow / ci-v2-shadow / nvidia
SUCCESS  ci / unit / iluvatar
SUCCESS  ci-v2-shadow / ci-v2-shadow / iluvatar
SUCCESS  ci / unit / metax
SUCCESS  ci-v2-shadow / ci-v2-shadow / metax
SUCCESS  ci / unit / moore
SUCCESS  ci-v2-shadow / ci-v2-shadow / moore
SUCCESS  ci / unit / cambricon
SUCCESS  ci-v2-shadow / ci-v2-shadow / cambricon
FAILURE  ci / unit / ascend
FAILURE  ci-v2-shadow / ci-v2-shadow / ascend
FAILURE  ci / Fail queued CI jobs after 10 minutes
FAILURE  ci-v2-shadow / Fail queued CI v2 jobs after 10 minutes
```
Local Iluvatar build output summary
```shell
cmake -S . -B /tmp/infiniops-iluvatar-bindings -G Ninja -DWITH_ILUVATAR=ON -DAUTO_DETECT_DEVICES=OFF -DAUTO_DETECT_BACKENDS=ON -DGENERATE_PYTHON_BINDINGS=ON
cmake --build /tmp/infiniops-iluvatar-bindings -j$(nproc)
# Completed: [198/198] Linking CXX shared module src/ops.cpython-310-x86_64-linux-gnu.so

python -m pip install . --no-build-isolation --no-deps
# Completed: built and installed infiniops-0.1.0-cp310-cp310-linux_x86_64.whl

cmake -S . -B /tmp/infiniops-iluvatar-depfile-policy -G Ninja -DWITH_ILUVATAR=ON -DAUTO_DETECT_DEVICES=OFF -DAUTO_DETECT_BACKENDS=ON -DGENERATE_PYTHON_BINDINGS=ON
# Completed without CMP0116 warnings; build.ninja contains CoreX generated_dispatch_0.o.d depfile handling.
```
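To re-check the depfile result above without running a build, a small CMake script-mode check along these lines could be used. The script name check_dispatch_depfile.cmake and the BUILD_DIR variable are hypothetical; the only fact it relies on is the note above that build.ninja mentions generated_dispatch_0.o.d.

```cmake
# check_dispatch_depfile.cmake (hypothetical helper, run in script mode):
#   cmake -DBUILD_DIR=/tmp/infiniops-iluvatar-depfile-policy -P check_dispatch_depfile.cmake
if(NOT DEFINED BUILD_DIR)
  message(FATAL_ERROR "Pass -DBUILD_DIR=<build directory>.")
endif()

# The Ninja generator writes all build statements into the top-level build.ninja.
file(READ "${BUILD_DIR}/build.ninja" ninja_contents)

# The explicit Iluvatar dispatch command should reference its compiler depfile.
string(FIND "${ninja_contents}" "generated_dispatch_0.o.d" pos)
if(pos EQUAL -1)
  message(FATAL_ERROR "No generated_dispatch_0.o.d depfile entry in build.ninja.")
endif()
message(STATUS "Found depfile handling for generated_dispatch_0.o in build.ninja.")
```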

Benchmark / Performance Impact

N/A. This is a build-path fix only.

Notes for Reviewers

The intended invariant is that NVIDIA still uses CMake LANGUAGE CUDA for generated binding sources, while Iluvatar uses explicit custom commands only for generated_dispatch*.cc. Ordinary generated pybind sources remain CXX for Iluvatar. The explicit Iluvatar dispatch commands emit compiler depfiles so header and kernel implementation changes are tracked by Ninja incremental builds.
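A rough sketch of that invariant as CMake logic, assuming the generated sources are collected in a list named generated_sources and linked into a target named ops (both illustrative names). The exact variables and structure in src/CMakeLists.txt may differ.

```cmake
# Hypothetical sketch of the per-backend split; `generated_sources`,
# `iluvatar_dispatch_sources`, and the `ops` target are illustrative names.
set(dispatch_sources ${generated_sources})
list(FILTER dispatch_sources INCLUDE REGEX "generated_dispatch[^/]*\\.cc$")

if(WITH_NVIDIA)
  # NVIDIA: unchanged, generated binding sources still use LANGUAGE CUDA.
  set_source_files_properties(${generated_sources} PROPERTIES LANGUAGE CUDA)
  target_sources(ops PRIVATE ${generated_sources})
elseif(WITH_ILUVATAR)
  # Iluvatar: ordinary generated pybind sources stay on the default CXX path.
  set(cxx_sources ${generated_sources})
  list(FILTER cxx_sources EXCLUDE REGEX "generated_dispatch[^/]*\\.cc$")
  target_sources(ops PRIVATE ${cxx_sources})

  # Only generated_dispatch*.cc is kept off the target's normal compile and
  # handed to explicit CoreX clang++ -x ivcore custom commands (not shown).
  set(iluvatar_dispatch_sources ${dispatch_sources})
endif()
```

Filtering by file name keeps the rule narrow: only generated_dispatch*.cc leaves the CXX path on Iluvatar, so every other generated source keeps CMake's regular compile and dependency handling.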


Checklist

Title, Branch, and Commits

  • PR title follows Conventional Commits.
  • Branch name follows <type>/xxx-yyyy-zzzz.
  • Each commit message follows Conventional Commits.
  • Small PR is a single squashable commit.
  • No stray merge commits from master; branch is based on origin/master.
  • No fixup! / squash! / wip commits remain.

Scope and Design

  • Changes are minimal and limited to CMakeLists.txt and src/CMakeLists.txt.
  • No dead code, commented-out blocks, debug prints, or ownerless TODOs were added.
  • No unrelated formatting churn was introduced.
  • Public API changes: N/A, no public API is changed.

General Code Hygiene

  • Comments were added only to explain the non-obvious Iluvatar compiler path.
  • Modified files end with a trailing newline.
  • git diff --check HEAD~1 HEAD passes.
  • Comments and error messages are in English.
  • Comments and error messages are complete sentences where applicable.

C++ Specific

N/A. No C++ source/header files were changed.

Python Specific

N/A. No Python files were changed.

Testing

  • Iluvatar build was run locally and recorded above.
  • GitHub Actions status was recorded above.
  • Platforms not changed by this PR are marked as not affected in the notes.
  • N/A: No new operator functionality was added.
  • N/A: No new tests were added.
  • N/A: This is a build-system regression fix; the regression is covered by the Iluvatar binding build command above.

Build, CI, and Tooling

  • The project builds cleanly from a fresh CMake directory for Iluvatar with generated Python bindings.
  • pip install . --no-build-isolation --no-deps succeeds in the Iluvatar development image.
  • Ninja depfile generation for Iluvatar dispatch objects is present in build.ninja.
  • GitHub Actions clang-format, ruff, matrix generation, and unit jobs for NVIDIA, Iluvatar, MetaX, Moore, and Cambricon passed.
  • GitHub Actions Ascend jobs: failed. Ascend is not changed by this PR.
  • N/A: No new backend/device was added.
  • Existing CUDA-like backend mutual exclusion is unchanged.
  • N/A: clang-format.yml and ruff.yml do not apply to the modified CMake files.
  • No new runtime dependency was added.

Documentation

  • N/A: No user-visible API, operator, or public utility behavior changed.

Security and Safety

  • No secrets, access tokens, internal URLs, customer data, or personal hardware identifiers have been committed.
  • N/A: No third-party code was added.
  • No unsafe pointer arithmetic, uninitialized reads, or missing bounds checks were introduced.

@zhangyue207 requested a review from a team May 19, 2026 05:48
@zhangyue207 (Collaborator, Author) commented:

The Ascend CI fails on a few test cases; this is fixed in PR #614.

@bitzyz previously approved these changes May 20, 2026
@zhangyue207 force-pushed the fix/iluvatar-binding-build branch from 4d1bbdc to 1ee76bf May 20, 2026 03:43
@voltjia merged commit 2a2375a into master May 20, 2026
16 of 20 checks passed
@voltjia deleted the fix/iluvatar-binding-build branch May 20, 2026 06:39